Abstract

Background: Various types of framing can influence risk perceptions, which may have an impact on treatment 
decisions and adherence. One way of framing is the use of verbal terms in communicating the probabilities of 
treatment effects. We systematically reviewed the comparative effects of words versus numbers in communicating 
the probability of adverse effects to consumers in written health information. 

Methods: Nine electronic databases were searched up to November 2012. Teams of two reviewers independently 
assessed studies. Inclusion criteria: randomised controlled trials; verbal versus numerical presentation; context: 
written consumer health information. 

Results: Ten trials were included. Participants perceived probabilities presented in verbal terms as higher than in 
numeric terms: commonly used verbal descriptors systematically led to an overestimation of the absolute risk of 
adverse effects (Range of means: 3% - 54%). Numbers also led to an overestimation of probabilities, but the 
overestimation was smaller (2% - 20%). The difference in means ranged from 3.8% to 45.9%, with all but one 
comparison showing significant results. Use of numbers increased satisfaction with the information (MD: 0.48 [CI: 
0.32 to 0.63], p < 0.00001, I² = 0%) and likelihood of medication use (MD for very common side effects: 1.45 [CI: 0.78 
to 2.11], p = 0.0001, I² = 68%; MD for common side effects: 0.90 [CI: 0.61 to 1.19], p < 0.00001, I² = 1%; MD for rare 
side effects: 0.39 [CI: 0.02 to 0.76], p = 0.04, I² = not applicable). Outcomes were measured on a 6-point Likert scale, 
suggesting small to moderate effects. 

Conclusions: Verbal descriptors including "common", "uncommon" and "rare" lead to an overestimation of the 
probability of adverse effects compared to numerical information, if used as previously suggested by the European 
Commission. Numbers result in more accurate estimates and increase satisfaction and likelihood of medication use. 
Our review suggests that providers of consumer health information should quantify treatment effects numerically. 
Future research should focus on the impact of personal and contextual factors, use representative samples or be 
conducted in real life settings, measure behavioral outcomes and address whether benefit information can be 
described verbally. 

Background 

Ideally, patient decisions for and against medical treat- 
ments are made in the presence of knowledge of the best 
available evidence for the benefits and harms of these 
treatments. Personal preferences and values can influence 
treatment decisions and may - legitimately - lead people 
to make choices which are not necessarily in line with the 
evidence. There are, however, some cognitive biases that 
may interfere with treatment. In particular, various types 
of data framing can influence risk perceptions [1]. 

Poorly framed information on the risk of adverse ef- 
fects of drugs or other medical interventions may cause 
misinterpretation of the risks of harms. This may have 
an impact on treatment decisions and might also affect 
medication adherence. The 1995 contraceptive pill 
scare in the UK highlights the importance of helping 
doctors and patients understand risk information: 
media reports and "Dear Doctor" letters reported that 
the third-generation contraceptive pills increased the 
(relative) risk of blood clots by 100%, which caused 
many women to stop taking the pill and led to many 
unwanted pregnancies and abortions - although the 
absolute risk increase was as small as 0.014% [2]. 
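
The gap between the two framings can be made concrete with a
few lines of arithmetic. The sketch below is illustrative only:
it assumes that the 100% relative increase and the 0.014%
absolute increase refer to the same baseline, so the baseline
risk is inferred here rather than taken from [2], and the
function name is hypothetical.

```python
def absolute_risk_increase(baseline_risk: float, relative_increase: float) -> float:
    """Absolute risk increase implied by a relative increase over a baseline.

    Both arguments are proportions, e.g. 0.00014 for 0.014% and 1.0 for 100%.
    """
    return baseline_risk * relative_increase


# Assumption: a 100% relative increase that equals a 0.014% absolute increase
# implies a baseline risk of 0.014% (inferred, not reported in the text).
baseline = 0.00014
increase = absolute_risk_increase(baseline, relative_increase=1.0)
risk_with_exposure = baseline + increase

print(f"baseline risk:          {baseline:.3%}")
print(f"risk with exposure:     {risk_with_exposure:.3%}")
print(f"absolute risk increase: {increase:.3%}")
```

Under this assumption the same change can be described either
as a doubling of risk or as roughly one additional case per
7,000 people, which is why the choice of framing matters.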

One way of framing information is the use of words in 
communicating the probabilities of treatment effects. 
A prominent example of a nomenclature of words 
used to communicate frequencies of adverse effects is 
the one proposed by the European Commission in 
their 1998 guidelines on the readability of package 
leaflets and summary of product characteristics from the 
European Medical Association [3,4]. Table 1 shows the 
wording suggested in these guidelines. 

Several studies have compared the use of verbal terms 
versus numbers for communicating the frequency of 
adverse drug effects. However, to our knowledge no 
systematic review on the comparative effects of verbal 
versus numerical presentations of the frequency of ad- 
verse effects has been conducted. Risk communication 
has become a vast field which is difficult to keep up 
with. Thus, current recommendations on risk communication 
are often based on expert consensus or a selective review 
of the literature. For example, both the International 
Patient Decision Aid Standards (IPDAS) and the FDA's user's 
guide on communicating risks and benefits currently do not 
cite many of the studies we identified in our preliminary 
searches. The aim of this systematic review is to improve 
the evidence base of risk communication strategies by 
gathering and synthesizing the results from studies that 
examined different terms, scenarios and probabilities. 

Table 1 European Commission nomenclature for 
communicating frequency of adverse effects of drugs 

Description    Frequency interval 
Very common    (>1/10) 
Common         (>1/100 to <1/10) 
Uncommon       (>1/1000 to <1/100) 
Rare           (>1/10000 to <1/1000) 
Very rare      (<1/10000) 
Not known      cannot be estimated from the available data 
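
The nomenclature in Table 1 amounts to a lookup from an
observed frequency to a verbal descriptor. The following
sketch shows one way to express it; the function and constant
names are hypothetical, and treating each lower bound as
inclusive is an assumption, because Table 1 as printed does
not say how a frequency of exactly 1/10 or 1/100 should be
classified.

```python
from fractions import Fraction

# Lower bounds of the frequency bands in Table 1, most frequent first.
# Assumption: lower bounds are inclusive (not specified in Table 1 as printed).
EC_BANDS = [
    (Fraction(1, 10), "very common"),
    (Fraction(1, 100), "common"),
    (Fraction(1, 1000), "uncommon"),
    (Fraction(1, 10000), "rare"),
]


def ec_descriptor(events: int, exposed: int) -> str:
    """Return the Table 1 descriptor for `events` adverse events in `exposed` people."""
    if exposed <= 0 or events < 0 or events > exposed:
        raise ValueError("need exposed > 0 and 0 <= events <= exposed")
    frequency = Fraction(events, exposed)
    for lower_bound, label in EC_BANDS:
        if frequency >= lower_bound:
            return label
    return "very rare"


print(ec_descriptor(15, 100))    # 15% falls in the "very common" band
print(ec_descriptor(2, 100))     # 2% falls in the "common" band
print(ec_descriptor(1, 50000))   # 0.002% falls in the "very rare" band
```

Exact fractions are used instead of floating-point numbers so
that frequencies lying on a band boundary are classified
consistently.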

Methods 

Inclusion criteria 

We included studies examining the effects of words ver- 
sus numbers in communicating harms of treatments to 
consumers in written health information. Our inclusion 
criteria were: (1) study design: randomized controlled 
trials (RCTs); (2) outcomes: interpretation of probabil- 
ity, comprehension, recall, satisfaction, impact on deci- 
sion, likelihood of treatment utilization, adherence and 
psychological outcomes (e.g. anxiety); (3) context: treat- 
ment effects were communicated through written health 
information only and (4) language: studies published in 
English or German. 

Data sources and search methods 

We searched MEDLINE, Embase, PsycINFO, CINAHL, 
ERIC, DARE, the CDSR, CENTRAL and the Campbell 
Library. Searches were developed and conducted by an 
information specialist using a combination of MeSH- 
terms, free text and validated search filters for specific 
study designs, where available. See Additional file 1 for 
the search strategy used to identify relevant studies in 
MEDLINE. This was adapted as required to other 
databases. Searches were conducted up to the 9th of 
November 2012. Titles and abstracts of search results 
were assessed for eligibility independently by three 
reviewers in pairs. Full texts of potentially relevant studies 
were retrieved and assessed for eligibility independently by 
two reviewers. Reference lists of articles eligible for inclu- 
sion were screened for further potentially relevant studies. 

Data extraction and risk of bias assessment 

Data were extracted into standardized extraction sheets 
and double checked in pairs by three reviewers. These 
included data on study design, risk of bias items, popu- 
lation characteristics, study setting, study intervention 
and results for the relevant outcomes (means and stand- 
ard deviations). In studies that only reported p-values, 
t-values or confidence intervals, we derived standard de- 
viations from these statistics using the methods described 
in Chapter 7 of the Cochrane Handbook for Systematic 
Reviews [5]. 
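
As an illustration of this kind of conversion, the sketch
below backs a standard deviation out of a 95% confidence
interval or a t-value for a difference in means between two
groups. It follows the general approach described in the
Cochrane Handbook (standard error from the interval width or
from the t-value, then SD = SE / sqrt(1/n1 + 1/n2)). Which
formula was applied to which study is not stated in the text,
so the function names and the example inputs are assumptions.

```python
import math


def sd_from_ci(lower: float, upper: float, n1: int, n2: int) -> float:
    """Pooled SD implied by a 95% CI for a difference in means.

    Assumes a large-sample (normal) interval, so the CI spans 2 * 1.96 = 3.92
    standard errors; small samples would call for a t-distribution multiplier.
    """
    se = (upper - lower) / 3.92
    return se / math.sqrt(1 / n1 + 1 / n2)


def sd_from_t(mean_difference: float, t_value: float, n1: int, n2: int) -> float:
    """Pooled SD implied by a reported t-value for a difference in means."""
    se = abs(mean_difference / t_value)
    return se / math.sqrt(1 / n1 + 1 / n2)


# Hypothetical inputs chosen for illustration: two groups of 30 participants.
print(round(sd_from_ci(-47.77, -4.43, 30, 30), 2))
print(round(sd_from_t(-26.10, -2.36, 30, 30), 2))
```

Both routes return a single SD that is assumed to apply to
both groups, which is why derived SDs are identical across the
two arms of a comparison.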

Risk of bias was assessed for RCTs by random sequence 
generation, allocation concealment, completeness of 
follow-up and selective reporting bias. Judgements were made 
in accordance with the guidelines for the Cochrane risk 
of bias tool [6]. 

Data synthesis and analysis 

Data were entered into RevMan 5 and pooled. Mean dif- 
ferences (MD) and their corresponding 95% confidence 
intervals (CI) were calculated for outcomes that were 
measured on scales of considerable similarity. Otherwise 
standardised mean differences were calculated. Meta- 
analyses were conducted using random-effects models as 
the underlying rationale of random-effects models may 
be more appropriate when pooling heterogeneous data, 
while fixed and random-effects models produce the same 
result if data are homogeneous. A downside of 
random-effects models is that more weight is given to small 
studies which may have a higher risk of bias (small study 
bias), but this was not an issue in our review. Heterogeneity 
was measured using Chi²-tests and the I² statistic. If 
heterogeneity was detected, subgroup analyses were conducted 
to explore reasons for heterogeneity. Subgroups were 
planned a priori for age, gender, socioeconomic status, 
type of illness (mild or severe), size of absolute effect and 
severity of side effects. Where statistical heterogeneity 
remained, but there was strong contextual homogeneity, 
we opted in favour of pooling the data into meta-analyses, 
because of their additional informational value and the 
problems associated with narrative or pseudo-quantitative 
interpretation of results [7]. However, in these cases we 
did not pool results across subgroups. 
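
The pooling step can be sketched as follows, assuming the
DerSimonian and Laird estimator, which is the inverse-variance
random-effects method implemented in RevMan 5. The function
names are hypothetical; the input rows are the three
"uncommon" comparisons as read from Figure 2 (mean, SD and n
for the numerical group, then for the verbal group).

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (group 1 minus group 2) and its variance."""
    return m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2


def dersimonian_laird(effects, variances):
    """Pool effects with the DerSimonian-Laird random-effects estimator.

    Returns (pooled effect, CI lower, CI upper, tau^2, Q, I^2 in percent).
    """
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, q, i2


rows = [
    (6.98, 11.87, 48, 22.9, 22.03, 48),
    (12.2, 22.3, 52, 21.7, 20.2, 69),
    (10.7, 21.5436, 41, 17.2, 23.58, 50),
]
effects, variances = zip(*(mean_difference(*row) for row in rows))
pooled, lo, hi, tau2, q, i2 = dersimonian_laird(effects, variances)
print(f"MD {pooled:.2f} [{lo:.2f}, {hi:.2f}], tau2 {tau2:.2f}, Q {q:.2f}, I2 {i2:.0f}%")
```

With these inputs the sketch returns values that agree, to
rounding, with the pooled estimate reported for uncommon side
effects (MD -11.20 [-16.69 to -5.70], I² = 30%).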

Some studies had three comparison groups: two stud- 
ies compared a verbal, percentage and natural frequency 
presentation; one study compared a verbal, numerical 
and combined verbal/numerical presentation [8]. In this 
case we used data from both comparisons in our ana- 
lyses and divided the number of participants in the 
verbal group by two in order not to artificially inflate the 
statistical power of these studies in the meta-analyses. In 
two studies participants received two scenarios with dif- 
ferent adverse effects. In cases where both scenarios 
were relevant to the same meta-analysis, we averaged 
the results across the two scenarios. The standard devia- 
tions for these comparisons were recalculated to account 
for statistical dependence assuming a correlation of 0.5 
(sensitivity analyses with correlations of 0.1 and 0.9 pro- 
duced similar results). 
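
Both adjustments can be illustrated with a short sketch. The
formula for the SD of an average is the standard variance of
the mean of two correlated measurements; whether this is
exactly the recalculation used in the review is an assumption,
and the function names and example SDs are hypothetical.

```python
import math


def sd_of_average(sd_a: float, sd_b: float, correlation: float) -> float:
    """SD of the average of two correlated measurements from the same people."""
    variance = (sd_a ** 2 + sd_b ** 2 + 2 * correlation * sd_a * sd_b) / 4
    return math.sqrt(variance)


def split_shared_group(n_shared: int, comparisons: int = 2) -> list:
    """Split a shared group's sample size across comparisons without double counting."""
    base, remainder = divmod(n_shared, comparisons)
    return [base + (1 if i < remainder else 0) for i in range(comparisons)]


# Hypothetical SDs for two scenarios answered by the same participants,
# evaluated at the main and the two sensitivity-analysis correlations.
for r in (0.1, 0.5, 0.9):
    print(r, round(sd_of_average(1.4, 1.2, r), 3))

# A verbal group of 69 participants shared by two comparisons.
print(split_shared_group(69))
```

A higher assumed correlation yields a larger SD for the
averaged outcome, so the choice of correlation mainly affects
the weight a study receives rather than its point estimate.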

Results 

Search results 

Figure 1 shows a flow diagram depicting the study selec- 
tion process in accordance with the PRISMA statement 
[9]. Our searches yielded 1201 potentially relevant articles. 
Seven articles including ten studies remained eligible for 
inclusion after applying the inclusion criteria [8,10-15]. 



Description of studies 

All studies were randomized controlled trials, many of 
which used a factorial design. Some studies were reported 
in more than one publication. All studies randomized 
participants to short information leaflets on drugs for a 
particular condition, which only differed in whether the 
information on the frequency of the adverse effects of 
the drug was presented verbally or numerically. One 
study examined a combination of a verbal and numer- 
ical description, as it is currently included in the 2009 
European Commission Guideline on the readability of 
package leaflets [16]. The interventions and outcomes 
of the studies were very similar and mainly differed with 
respect to the conditions and drugs that were used in 
the scenarios as well as the frequency and the severity 
of the side effects. The studies included five outcomes 
of interest to our review: estimation of probabilities 
(in percentages), likelihood of occurrence, satisfaction, 
intention to take or continue to take the medicine and 
the impact of the information on the decision. The last 
four outcomes were all measured as one item on a 6-point 
Likert scale. All outcomes were measured shortly after 
distribution of the information leaflets, and none of the 
studies had a follow-up. In many cases the participants 
received information on more than one adverse effect, 
resulting in a higher number of comparisons than stud- 
ies for the outcome estimation of probability. 

In all but one study participants were recruited from 
the general population or via a cancer website and con- 
fronted with a hypothetical scenario. The studies were all 
conducted by two groups of authors from the UK, who 
were interested in evaluating the effects of the nomencla- 
ture used in drug package inserts in the European Union. 
Thus, the verbal descriptors that were studied in the trials 
were: very common, common, uncommon, rare and very 
rare. See Additional file 2 for detailed characteristics of the 
included studies with additional results from individual 
studies regarding effect modifiers. 

Risk of bias 

Methods of sequence generation, allocation concealment 
and information on incomplete outcome data were fre- 
quently not reported. Thus, none of the studies was 
formally rated low on all risk of bias items (Table 2). 
Nevertheless we consider the overall risk of bias to be 
low. For one thing, there was a large overlap in the 
group of authors, suggesting that the methods used were 
likely to be appropriate despite not being fully reported 
in each study - especially considering that unconcealed 
allocation was explicitly acknowledged in one study. For 
another thing, the design of the studies was rather sim- 
ple. Participants completed questionnaires immediately 
after reading the information in the presence of the 
researchers. Therefore, it can be assumed that missing 
data were not an issue even if this was not explicitly 
stated for each item in all publications. 

One study used an unconcealed allocation. However, the 
authors of the study argued that this was unlikely to bias 
the results, because it seems unlikely that the researcher 
could be able to anticipate the participants' response to 
verbal or numerical information. Furthermore, excluding 
this study did not alter the results. There were no signs of 
selective reporting. 

Table 2 Risk of bias of included studies 

Study                      Random sequence generation   Allocation concealment   Incomplete outcome data   Selective reporting 
Berry 2002 Study 1 [10]    Unclear                      Unclear                  Low                       Low 
Berry 2002 Study 2 [10]    Unclear                      Unclear                  Low                       Low 
Berry 2003 Study 1 [11]    Unclear                      Unclear                  Low                       Low 
Berry 2003 Study 2 [11]    Unclear                      Unclear                  Low                       Low 
Berry 2004 [12]            Unclear                      Unclear                  Low                       Low 
Berry 2006 [13]            Unclear                      Unclear                  Unclear                   Low 
Knapp 2004 [14]            Unclear                      High                     Unclear                   Low 
Knapp 2009a [8]            Low                          Low                      Unclear                   Low 
Knapp 2009b Study 1 [8]    Low                          Low                      Unclear                   Low 
Knapp 2009 Study 2 [15]    Low                          Low                      Unclear                   Low 

Records identified through database searching (n = 1769) 
Additional records identified through other sources (n = 21) 
Total number of citations retrieved (n = 1790) 
Duplicates (n = 589) 
Records after electronic removal of duplicates (n = 1201) 
Records excluded based on title/abstract (n = 1174) 
Full-text articles assessed for eligibility (n = 27) 
    Not conducted in the context of written patient information (n = 3) 
    No verbal comparison group (n = 10) 
    Study with physicians or nurses (n = 5) 
    Used a pictorial (n = 1) 
    Secondary publication (n = 1) 
Articles/Studies included in systematic review (n = 7/10) 

Figure 1 Study selection process. Flow-chart showing the study selection procedure according to PRISMA reporting guidelines. 

Effects of Interventions 

Estimation of probabilities 

There were 19 comparisons from 10 studies with 2145 
observations for the outcome estimation of probabilities. 

The verbal descriptors used for communicating frequencies 
of adverse effects systematically led to an overestimation 
of the probability of adverse effects compared 
to a numerical presentation (MD for very common side 
effects: -31.54 [CI: -43.32 to -19.77], p < 0.00001, I² = 
91%; MD for common side effects: -35.36 [CI: -39.92 
to -30.81], p < 0.00001, I² = 48%; MD for uncommon 
side effects: -11.20 [CI: -16.69 to -5.70], p < 0.0001, I² = 
30%; MD for rare side effects: -10.11 [CI: -15.64 to -4.58], 
p = 0.0003, I² = 58%). The absolute magnitude of the 
overestimation was larger in the very common and common 
subgroups than in the uncommon and rare subgroups, as 
would be expected (Figure 2). Subgroup analyses by 
frequency did not fully explain the heterogeneity. The 
differences in frequencies used in the studies are likely to 
contribute to this heterogeneity (see Additional file 3). 

Interestingly, even participants who received a probabil- 
ity estimate of the frequency of the adverse effects often 
overestimated these values. Only between 9% and 50% of 
the participants in the numerical groups gave a correct 
probability for the adverse effects (see Additional file 3). 
However, this was not always reported. Furthermore, the 
variability in responses between participants was large, 
which is indicated by large standard deviations and wide 
ranges. 

See Additional file 3 for a detailed table of the results 
of the comparisons from each study by verbal descriptor 
and type and frequency of adverse effect together with 
the results of the significance tests as they were reported 
in the primary studies. 

Likelihood of occurrence 

Likelihood of occurrence as measured on a Likert scale 
was considered in 10 comparisons with 892 observa- 
tions. Participants who received a verbal presentation of 
the frequency of adverse effects thought they were more 
likely to occur than those who received numerical 
information (MD for very common side effects: 0.80 [CI: 0.24 
to 1.37], p = 0.006, I² = 85%; MD for common side 
effects: 1.39 [CI: 1.05 to 1.74], p < 0.00001, I² = 0%; MD 
for rare side effects: 0.90 [CI: 0.30 to 1.50], p = 0.003, 
I² = not applicable). We conducted a subgroup analysis by 
frequency of adverse effect, but this did not fully explain the 
large heterogeneity for this outcome (Figure 3). However, 
the heterogeneity can mainly be attributed to one trial 
[10], which showed a large difference and included 
postgraduate or undergraduate students in contrast to 
the other studies, which included participants from the 
general public. Excluding this trial from the analysis 
reduces the heterogeneity in the respective subgroup 
from 85% to 39%. 

One trial compared a numerical presentation with a 
combined format. Splitting this trial from the others we 
conducted a second, exploratory subgroup analysis on 
this outcome. This suggested that the verbal presentation 
may dilute the effects of a numerical presentation on this 
outcome (test for subgroup difference, p = 0.003, analysis 
not shown) [15]. 

Satisfaction 

Satisfaction with the information was measured in 12 
comparisons with 1228 observations. Satisfaction was 
consistently higher in groups that received a numerical 
description of the frequency of adverse effects compared 
to a verbal description (MD: 0.48 [CI: 0.32 to 0.63], 
p < 0.00001, I² = 0%) (Figure 4). 

Likelihood of taking the medicine 

Data for the outcome likelihood of taking or continuing 
to take the drug in the scenario were available from 5 
comparisons with 780 observations. Participants who were 
presented with numbers stated that they were or would 
be more likely to take or continue taking the drugs which 
were suggested to them (MD for very common side 
effects: 1.45 [CI: 0.78 to 2.11], p < 0.0001, I² = 68%; MD for 
common side effects: 0.90 [CI: 0.61 to 1.19], p < 0.00001, 
I² = 1%; MD for rare side effects: 0.39 [CI: 0.02 to 0.76], 
p = 0.04, I² = not applicable) (Figure 5). There was a 
significant effect for subgroup differences according 
to the frequency of the adverse effect (p = 0.01). Based on 
the EU nomenclature, this suggests that the larger the 
frequency of the adverse effect, the less likely it is that 
participants will take the drug, if they are presented 
with a verbal format. 

Buchter et al. BMC Medical Informatics and Decision Making 2014, 14:76 
http://www.biomedcentral.com/1472-6947/14/76 

                        Numerical/combined         Verbal 
Study or Subgroup       Mean    SD        Total    Mean    SD      Total   Weight   Mean Difference, IV, Random, 95% CI 

1.1.1 Very common 
Berry 2002 Study 2      20      11.2      56       64.4    20.2    56      21.0%    -44.40 [-50.45, -38.35] 
Berry 2003 Study 1      23.4    16.3      60       69.3    20.5    60      20.8%    -45.90 [-52.53, -39.27] 
Knapp 2009a             55.7    23        52       78.1    23.9    69      20.0%    -22.40 [-30.82, -13.98] 
Knapp 2009b Study 1     52.2    20.88     54       71.6    22.23   46      19.9%    -19.40 [-27.90, -10.90] 
Knapp 2009b Study 2     14.8    25.35     41       38.5    30.96   50      18.3%    -23.70 [-35.27, -12.13] 
Subtotal (95% CI)                         263                      281     100.0%   -31.54 [-43.32, -19.77] 
Heterogeneity: Tau² = 162.31; Chi² = 44.84, df = 4 (P < 0.00001); I² = 91% 
Test for overall effect: Z = 5.25 (P < 0.00001) 

1.1.2 Common 
Berry 2003 Study 2      9.5     14.2      90       50.5    24.4    90      20.9%    -41.00 [-46.83, -35.17] 
Berry 2004              19.94   22.12     94       56.61   23.68   94      19.2%    -36.67 [-43.22, -30.12] 
Berry 2006              18.86   23.52     48       58.23   22.89   48      13.7%    -39.37 [-48.65, -30.09] 
Knapp 2004              8.1     42.83     30       34.2    42.83   30      3.9%     -26.10 [-47.77, -4.43] 
Knapp 2009a             12.1    18.5      52       49.4    23.3    69      17.2%    -37.30 [-44.75, -29.85] 
Knapp 2009b Study 1     29.6    28.21     54       62.2    18.52   46      13.8%    -32.60 [-41.83, -23.37] 
Knapp 2009b Study 2     12.5    23.44     41       34.2    28.85   50      11.5%    -21.70 [-32.44, -10.96] 
Subtotal (95% CI)                         409                      427     100.0%   -35.36 [-39.92, -30.81] 
Heterogeneity: Tau² = 17.02; Chi² = 11.55, df = 6 (P = 0.07); I² = 48% 
Test for overall effect: Z = 15.22 (P < 0.00001) 

1.1.3 Uncommon 
Berry 2006              6.98    11.87     48       22.9    22.03   48      38.8%    -15.92 [-23.00, -8.84] 
Knapp 2009a             12.2    22.3      52       21.7    20.2    69      34.7%    -9.50 [-17.21, -1.79] 
Knapp 2009b Study 2     10.7    21.5436   41       17.2    23.58   50      26.5%    -6.50 [-15.78, 2.78] 
Subtotal (95% CI)                         141                      167     100.0%   -11.20 [-16.69, -5.70] 
Heterogeneity: Tau² = 7.18; Chi² = 2.87, df = 2 (P = 0.24); I² = 30% 
Test for overall effect: Z = 4.00 (P < 0.0001) 

1.1.4 Rare 
Berry 2003 Study 2      6.8     15.4      90       21.5    17.7    90      34.2%    -14.70 [-19.55, -9.85] 
Berry 2006              3.94    10.55     48       12.13   18.14   48      30.2%    -8.19 [-14.13, -2.25] 
Knapp 2004              2.1     29.92     30       18      29.92   30      10.4%    -15.90 [-31.04, -0.76] 
Knapp 2009a             11.1    20.2      52       14.9    21.2    69      25.2%    -3.80 [-11.23, 3.63] 
Subtotal (95% CI)                         220                      237     100.0%   -10.11 [-15.64, -4.58] 
Heterogeneity: Tau² = 17.18; Chi² = 7.07, df = 3 (P = 0.07); I² = 58% 
Test for overall effect: Z = 3.58 (P = 0.0003) 

Forest plot axis: -100 to 100 (labels: Higher numerical, Higher verbal) 

Figure 2 Estimation of frequency. Meta-analysis on estimation of frequency of adverse effects. 
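
The test for subgroup differences reported for likelihood of
taking the medicine can be approximated from the three pooled
subgroup estimates alone. The sketch below is a simplified
reconstruction, not the RevMan computation itself: standard
errors are backed out of the reported 95% CIs, the function
name is hypothetical, and the closed-form p-value is only
valid for an even number of degrees of freedom (a statistics
library would be needed otherwise).

```python
import math


def subgroup_difference_test(estimates, ci_lowers, ci_uppers):
    """Chi-square test for differences between pooled subgroup estimates.

    Returns (Q_between, degrees of freedom, p-value).
    """
    ses = [(hi - lo) / 3.92 for lo, hi in zip(ci_lowers, ci_uppers)]
    weights = [1.0 / se ** 2 for se in ses]
    overall = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q_between = sum(w * (y - overall) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if df < 2 or df % 2:
        raise ValueError("this sketch only handles an even, positive df")
    # For even df, the chi-square upper tail has a closed form (Poisson sum).
    half = q_between / 2
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return q_between, df, p_value


# Pooled MDs and 95% CIs for very common, common and rare side effects
# (likelihood of taking the medicine).
q, df, p = subgroup_difference_test(
    estimates=[1.45, 0.90, 0.39],
    ci_lowers=[0.78, 0.61, 0.02],
    ci_uppers=[2.11, 1.19, 0.76],
)
print(f"Q = {q:.2f}, df = {df}, p = {p:.3f}")
```

The resulting p-value is close to the reported p = 0.01; small
deviations are expected because the inputs are rounded.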

Impact of information on decision 

The impact of the information on the decision to take or 
continue to take the medication was measured in 7 com- 
parisons with 532 observations. Verbal presentations of 
adverse effects had a larger impact on the decision to 
take the drugs than numerical presentations (MD: 0.52 
[CI: 0.22 to 0.82], p = 0.0007, I² = 0%) (Figure 6). There 
was a significant subgroup effect for the difference 
between a numerical and a combination of numerical 
and verbal presentation, suggesting that the verbal 
presentation may dilute the effects of a numerical 
presentation on this outcome (test for subgroup 
differences p = 0.02). However, this subgroup analysis is 
restricted to one study and was conducted post hoc, 
so the results should be interpreted with caution. 

Discussion 

This systematic review provides evidence that, compared 
to numerical information, verbal descriptors commonly 
used to communicate the frequencies of adverse effects 
in written health information, including "common", 
"uncommon" and "rare", lead to an overestimation of the 
probability of adverse effects when they are used as 
previously suggested in the Guidelines of the European 
Commission. 

                        Verbal                   Numerical 
Study or Subgroup       Mean    SD      Total    Mean    SD      Total   Weight   Mean Difference, IV, Random, 95% CI 

1.1.1 Very common side effect (15 %) 
Berry 2002 Study 2      4.4     1.1     56       2.5     1       56      15.8%    1.90 [1.51, 2.29] 
Knapp 2009a             4.3     1.4     34       4.3     1.4     66      14.5%    0.00 [-0.58, 0.58] 
Knapp 2009a             4.3     1.4     35       3.6     1.2     52      14.6%    0.70 [0.13, 1.27] 
Knapp 2009b Study 1     3.48    1.51    25       2.43    1.21    46      13.6%    1.05 [0.36, 1.74] 
Knapp 2009b Study 1     3.48    1.51    25       2.44    1.33    41      13.4%    1.04 [0.32, 1.76] 
Knapp 2009b Study 2     4.59    1.35    23       4.3     1.17    54      14.1%    0.29 [-0.34, 0.92] 
Knapp 2009b Study 2     4.59    1.35    23       4.04    1.14    48      14.0%    0.55 [-0.09, 1.19] 
Subtotal (95% CI)                       221                      363     100.0%   0.80 [0.24, 1.37] 
Heterogeneity: Tau² = 0.49; Chi² = 39.73, df = 6 (P < 0.00001); I² = 85% 
Test for overall effect: Z = 2.77 (P = 0.006) 

1.1.2 Common side effect (2 %) 
Berry 2004              3.97    1.37    94       2.61    1.22    94      85.5%    1.36 [0.99, 1.73] 
Knapp 2004              4.2     1.78    30       2.6     1.78    30      14.5%    1.60 [0.70, 2.50] 

Figure 3 Likelihood of occurrence. Meta-analysis on perceived likelihood of occurrence. 

It could be argued that other verbal terms are needed 
to describe frequencies. However, we are not aware of any 
studies comparing verbal terms other than those suggested 
in the 1998 European Commission's guidelines. 
Some studies have asked patients to assign probability 
values to a range of different verbal frequency terms 
[17]. According to these studies, other words do not 
appear to be better suited to describe frequencies than 
those previously suggested by the European Commission. 
For example, in one study in a general practice setting, the 
terms "almost never" and "rarely" were associated with the 
lowest frequencies [18]. The probabilities assigned to these 
terms were still very high with 9.9% and 7.5%, respectively. 
Furthermore, the standard deviations in these studies were 
large, which is in line with our results and suggests a large 
variance in the frequencies assigned to different terms. 
This indicates that risk expressions should be tested for 
understanding before being routinely used. Furthermore, 
it suggests that there may be no verbal labels that are 
suited to convey frequencies, particularly of rare ad- 
verse effects. 

Even participants who received numerical information 
overestimated the risk of adverse effects. This is in line 
with other findings showing that people are generally 
poor at estimating risks [19]. Low numeracy in some of 
the patients may also explain this finding. In the UK, for 
example, one study suggested that one third of adults 
above the age of 50 had limited functional health literacy 



[20]. Another possible explanation for this finding is that 
patients may perceive their personal risk of experiencing 
adverse effects to be larger than average. 

People seem to be more satisfied with numerical 
presentations and state that they would be more likely to 
take the drugs or continue taking them. Participants also 
stated that they would be less affected in their decisions 
by numerical presentations. These outcomes were 
measured on a 6-point Likert scale. Converting differences 
into percentages on the scale suggests changes between 
7% and 24%, which can be considered to be in the small 
to moderate, but important range. Most effects were also 
in the small to moderate range based on Cohen's 
interpretation, when converting effects into standardised 
mean differences. Some of the effects may be considered 
relatively large, since there is a tendency for people to 
avoid extreme answers on scales where extreme values 
are labelled in absolute terms, as was the case in the 
studies included in this article [21]. 
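
The 7% to 24% range can be reproduced by dividing the pooled
mean differences by the six points of the scale. That divisor
is an assumption inferred from the reported range, not a
method stated in the text, and the function name is
hypothetical.

```python
def percent_of_scale(mean_difference: float, scale_points: int = 6) -> float:
    """Express a mean difference as a percentage of the number of scale points."""
    return 100 * mean_difference / scale_points


# Smallest and largest pooled mean differences reported for the Likert outcomes.
for md in (0.39, 1.45):
    print(f"MD {md:.2f} -> {percent_of_scale(md):.1f}% of a 6-point scale")
```

Dividing by the scale range (five intervals between six
points) instead would give somewhat larger percentages, so the
choice of divisor should be stated when such conversions are
reported.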

Subgroup analyses suggested that combined verbal and 
numerical formats may dilute the effects of the numerical 
presentation on two outcomes, namely likelihood of 
occurrence (as measured on a Likert scale) and impact on 
treatment decision. However, these results are based on a 
post-hoc analysis and comparison with the combined 
format was restricted to a single trial with 100 participants. 

Figure 4 Satisfaction with information. Meta-analysis on satisfaction with information. 

Figure 6 Impact on decision. Meta-analysis on impact of information on (hypothetical) decision. 

Challenges for providers of patient information 

Providers of patient information often have a broad 
audience and face the problem that people have different 
preferences regarding the need for and use of risk estimates. 
The meaning that is ascribed to such information varies 
greatly. While some express a clear need for risk esti- 
mates, others are confused by numbers and prefer to 
make decisions based on other types of information [22]. 
Different preferences imply that using a combined verbal 
and numerical format may be the best compromise to 
suit various needs. This is also reflected in the current 
European Commission Guideline on readability from 
2009 as well as the current EU template for patient leaf- 
lets [16,23]. Providing different information for different 
groups according to their preferences would be an op- 
tion, but it may be difficult to direct patients to the in- 
formation that best suits their needs. 

Unfortunately, data on adverse effects are often poorly 
reported in trials and systematic reviews, which compli- 
cates the issue [24,25]. Furthermore, there might still be a 
role for verbal terms in written information, for example 
for people with difficulties in understanding numbers, or 
when large amounts of numbers make information too 
difficult to comprehend. It is difficult to draw a clear rec- 
ommendation for providers of patient information as it is 
unlikely that there is a one-size-fits-all approach. This will 
depend on many other factors such as the context and the 
target group of the information. 

Limitations of the review 

Our review is based on a comprehensive search and 
used rigorous methods for assessing and synthesising 
the included studies. It is reported in accordance with 
the PRISMA statement (Additional file 4). However, it has 
some limitations. We restricted our search to English and 
German studies. This may have introduced a language bias. 
We do not consider this to be a major weakness though, 
since it is questionable whether results can be generalized 
from one language to another due to semantic differences. 

Limitations of the included studies 

Many studies were conducted with healthy volunteers 
and used fictional scenarios. There were some exceptions: 
one study included patients admitted to a cardiac rehabili- 
tation centre and produced similar results. Three studies 
of users of a patient information website partially included 
women with experiences of breast cancer. While they also 
produced similar results, these trials had some limitations,
too. Some of the women in these studies were already 
taking the medication which was used in the scenario, 
which questions the applicability to other populations. 
An important caveat of all studies was that they used 
convenience samples, which may lack representativeness. 
Lastly, all outcomes were measured as single items. This
may be problematic for an outcome such as satisfaction, 
which represents a complex construct. However, informa- 
tion leaflets only differed in one sentence and the results 
for this outcome were very homogeneous, adding strength
to the findings. 



Conclusions 

Our review suggests that, whenever possible, adverse
treatment effects should be quantified numerically.
[...] because they lead to better estimates of risks. Verbal risk
expressions should be tested for understanding before 
being routinely used. 

Further studies should focus on the impact of personal 
and contextual factors, including the setting, disease, nu- 
meracy and educational level. Furthermore they should 
use representative samples or be conducted in real life 
settings and measure potentially more relevant outcomes 
such as actual behavior (including decisions and medica- 
tion adherence for example) and whether decisions are 
in line with personal values. After all, risk communica- 
tion is not an end in itself, but a means to the end of 
making better decisions. On a more critical note, it is 
questionable whether a difference solely in how infor- 
mation on adverse effects is communicated could have 
a detectable effect on behavioral outcomes. A recent 
systematic review examined whether informing patients 
about benefits and harms of medicines compared to 
usual care has an impact on behavior at all [26]. Overall, 
the results did not show a significant effect. This system- 
atic review had some limitations including heteroge- 
neous results and statistical imprecision and there is 
some difficulty in interpreting the results. However, it 
suggests that we may need to focus on more general 
questions regarding the effects of provision of informa- 
tion on behavioral outcomes. 

A further unanswered question is how different for- 
mats for describing the frequency of adverse effects are 
interpreted when they are presented together with treat- 
ment benefits, since these are also often overestimated 
by patients [27]. Qualitative research methods may be 
able to shed some light into how people come to assign 
probabilities to words. On a final note, further research 
should be conducted within the framework of a system- 
atic review of the literature. 

Additional files 



Additional file 1: MEDLINE search strategy. 

Additional file 2: Characteristics of included studies. RCT = randomised
controlled trial.

Additional file 3: Probability estimates for different wordings. N/A =
data not available.

Additional file 4: PRISMA Checklist.



Competing interests 

All authors are purveyors and proponents of evidence based consumer 
health information. The authors did not receive any funding for this work 
apart from their salary. 

Authors' contributions 

RBB screened search results, extracted data, assessed studies for quality,
performed the statistical analyses and drafted the manuscript. DF screened
search results, extracted data and assessed study quality. AW extracted data,
assessed study quality and contributed to statistical analyses. MK designed
the search strategy and performed searches. ME screened search results. All
authors participated in the design of the study and critically revised the
manuscript. All authors read and approved the final manuscript.

Acknowledgements 

We thank Ulrich Grouven for his kind statistical advice and Stefan Lange for 
critically reviewing the manuscript. 

Received: 17 September 2013 Accepted: 20 August 2014 
Published: 26 August 2014 

References 

1. Edwards A, Elwyn G, Covey J, Matthews E, Pill R: Presenting risk
information-a review of the effects of "framing" and other 
manipulations on patient outcomes. J Health Commun 2001, 6:61-82. 

2. Gigerenzer G, Gaissmaier W, Kurz-Milcke E, Schwartz LM, Woloshin S: 
Helping doctors and patients to make sense of health statistics. Psychol 
Sci Public Interest 2007, 8:53-96.

3. European Commission (EC): A Guideline on the Readability of the Label
and Package Leaflet of Medicinal Products for Human Use.
[http://pharma.be/assets/files/854/854_128901376878944246.pdf]

4. European Commission (EC): A Guideline on Summary of Product
Characteristics (SmPC).
[http://ec.europa.eu/health/files/eudralex/vol-2/c/smpc_guideline_rev2_en.pdf]

5. Higgins JPT, Green S (Eds): Cochrane handbook for systematic reviews of
interventions. [http://handbook.cochrane.org/]

6. Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savović J,
Schulz KF, Weeks L, Sterne JA, Cochrane Bias Methods Group; Cochrane
Statistical Methods Group: The Cochrane collaboration's tool for assessing
risk of bias in randomized trials. BMJ 2011, 343:d5928.

7. Ioannidis JP, Patsopoulos NA, Rothstein HR: Reasons or excuses for
avoiding meta-analysis in forest plots. BMJ 2008, 336:1413-1415. 

8. Knapp P, Gardner PH, Carrigan N, Raynor DK, Woolf E: Perceived risk of 
medicine side effects in users of a patient information website: a study 
of the use of verbal descriptors, percentages and natural frequencies. 
Br J Health Psychol 2009, 14:579-594. 

9. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group: Preferred 
reporting items for systematic reviews and meta-analyses: the PRISMA 
statement. BMJ 2009, 339:b2535. 

10. Berry DC, Knapp PR, Raynor T: Is 15 per cent very common? Informing
people about the risks of medication side effects. Int J Pharm Pract 2002,
10:145-151.

11. Berry DC, Raynor DK, Knapp P: Communicating risk of medication side
effects: an empirical evaluation of EU recommended terminology. 
Psychol Health Med 2003, 8:251-263. 

12. Berry D, Raynor T, Knapp P, Bersellini E: Over the counter medicines and 
the need for immediate action: a further evaluation of European 
commission recommended wordings for communicating risk. Patient 
Educ Couns 2004, 53:129-134.

13. Berry DC, Hochhauser M: Verbal labels can triple perceived risk in clinical 
trials. Drug Inform J 2006, 40:249-258. 

14. Knapp P, Raynor DK, Berry DC: Comparison of two methods of presenting 
risk information to patients about the side effects of medicines. Qual Saf 
Health Care 2004, 13:176-180.

15. Knapp P, Raynor DK, Woolf E, Gardner PH, Carrigan N, McMillan B:
Communicating the risk of side effects to patients: an evaluation of UK 
regulatory recommendations. Drug Saf 2009, 32:837-849. 

16. European Commission (EC): Guideline on the Readability of the Labelling
and Package Leaflet of Medicinal Products for Human Use.
[http://ec.europa.eu/health/files/eudralex/vol-2/c/2009_01_12_readability_guideline_final_en.pdf]

17. Eiser JR: Communication and interpretation of risk. Br Med Bull 1998,
54:779-790.

18. Woloshin KK, Ruffin MT 4th, Gorenflo DW: Patients' interpretation of
qualitative probability statements. Arch Fam Med 1994, 3:961-966. 

19. Lichtenstein S, Slovic P, Fischhoff B, Layman M, Combs B: Judged
frequency of lethal events. Exp Psychol Hum Learn Memory 1978, 
4:551-578. 

20. Bostock S, Steptoe A: Association between low functional health literacy 
and mortality in older adults: longitudinal cohort study. Brit Med J 2012, 
344:e1602.






21. Streiner DL, Norman GR: Health Measurement Scales: A Practical Guide for
their Development and Use. New York: Oxford University Press; 2008.

22. Fisseni G, Lewis DK, Abholz HH: Understanding the concept of medical
risk reduction: a comparison between the UK and Germany. Eur J Gen
Pract 2008, 14:109-116. 

23. European Medicines Agency (EMA): Quality Review of Documents Human
Product-information Annotated Template (English) Version 9.
[http://www.ema.europa.eu/ema/index.jsp?curl=pages/regulation/document_listing/document_listing_000134.jsp]

24. Cornelius VR, Perrio MJ, Shakir SA, Smith LA: Systematic reviews of adverse
effects of drug interventions: a survey of their conduct and reporting 
quality. Pharmacoepidemiol Drug Saf 2009, 18:1223-1231. 

25. Ioannidis JP, Lau J: Completeness of safety reporting in randomized trials:
an evaluation of 7 medical areas. J Amer Med Assoc 2001, 285:437-443.

26. Crockett RA, Sutton S, Walter FM, Clinch M, Marteau TM, Benson J: Impact
on decisions to start or continue medicines of providing information to
patients about possible benefits and/or harms: a systematic review and
meta-analysis. Med Decis Making 2011, 31:767-777.

27. Hamrosi K, Dickinson R, Knapp P, Raynor DK, Krass I, Sowter J, Aslani P: It's 
for your benefit: exploring patients' opinions about the inclusion of 
textual and numerical benefit information in medicine leaflets. Int J
Pharm Pract 2013, 21:216-225. 



doi:10.1186/1472-6947-14-76

Cite this article as: Buchter et al: Words or numbers? Communicating 
risk of adverse effects in written consumer health information: a 
systematic review and meta-analysis. BMC Medical Informatics and 
Decision Making 2014 14:76. 






